AITopics | directional convergence and alignment

Directional convergence and alignment in deep learning

Neural Information Processing Systems, Dec-24-2025, 14:31:31 GMT

In this paper, we show that although the minimizers of cross-entropy and related classification losses are off at infinity, network weights learned by gradient flow converge in direction, with an immediate corollary that network predictions, training errors, and the margin distribution also converge. This proof holds for deep homogeneous networks -- a broad class of networks allowing for ReLU, max-pooling, linear, and convolutional layers -- and we additionally provide empirical support not just close to the theory (e.g., the AlexNet), but also on non-homogeneous networks (e.g., the DenseNet). If the network further has locally Lipschitz gradients, we show that these gradients also converge in direction, and asymptotically align with the gradient flow path, with consequences on margin maximization, convergence of saliency maps, and a few other settings. Our analysis complements and is distinct from the well-known neural tangent and mean-field theories, and in particular makes no requirements on network width and initialization, instead merely requiring perfect classification accuracy. The proof proceeds by developing a theory of unbounded nonsmooth Kurdyka-Łojasiewicz inequalities for functions definable in an o-minimal structure, and is also applicable outside deep learning.
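The phenomenon the abstract describes can be seen in the simplest homogeneous model, a linear predictor trained with the logistic loss. The sketch below is only a toy illustration and not the paper's experiments: the four data points, the zero initialization, the step size, and the checkpoint schedule are all hypothetical, and gradient descent with a small step stands in for gradient flow. It records the weight norm and the normalized weight vector at a few checkpoints.

```python
import math

# Hypothetical toy data: four linearly separable points in 2D with labels +1/-1.
DATA = [((2.0, 1.0), 1), ((1.0, 2.5), 1), ((-1.5, -1.0), -1), ((-2.0, -0.5), -1)]


def grad_logistic(w):
    """Gradient of sum_i log(1 + exp(-y_i * <w, x_i>)) for a linear predictor."""
    g = [0.0, 0.0]
    for x, y in DATA:
        margin = y * (w[0] * x[0] + w[1] * x[1])
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); chain rule adds y * x.
        coeff = -y / (1.0 + math.exp(margin))
        g[0] += coeff * x[0]
        g[1] += coeff * x[1]
    return g


def run(steps=20000, lr=0.1, checkpoints=(200, 2000, 20000)):
    """Small-step gradient descent (a stand-in for gradient flow).

    Returns a list of (step, weight norm, unit direction) at each checkpoint.
    """
    w = [0.0, 0.0]
    history = []
    for t in range(1, steps + 1):
        g = grad_logistic(w)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
        if t in checkpoints:
            norm = math.hypot(w[0], w[1])
            history.append((t, norm, (w[0] / norm, w[1] / norm)))
    return history


if __name__ == "__main__":
    for step, norm, direction in run():
        print(step, round(norm, 3), tuple(round(c, 4) for c in direction))
```

On this toy problem the norm keeps growing across checkpoints, since the loss has no finite minimizer, while the unit direction barely moves. The paper's result concerns the much broader class of deep homogeneous networks; this linear case only makes the statement concrete.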

deep learning, directional convergence and alignment, name change


c76e4b2fa54f8506719a5c0dc14c2eb9-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-16-2025, 09:39:15 GMT

directional convergence, lyu and li, reviewer


Directional convergence and alignment in deep learning

Neural Information Processing Systems, May-27-2025, 11:33:15 GMT


artificial intelligence, directional convergence and alignment, machine learning


Review for NeurIPS paper: Directional convergence and alignment in deep learning

Neural Information Processing Systems, Feb-5-2025, 20:23:27 GMT

Weaknesses: I have two main critiques of this work. The first concerns the significance of its results. In the setting studied, directional convergence, alignment, and margin maximization have all been treated in several recent works (which the paper cites). I know that at least some of these works assumed, rather than proved, directional convergence and/or alignment; nonetheless, my feeling is that the paper does not draw a sufficiently clear line separating itself from the existing literature. For example, a very relevant existing work, Lyu and Li 2019, is said to have left open the issues of directional convergence and alignment, but to my knowledge that work does establish directional convergence, at least in some settings.

convergence and alignment, directional convergence, directional convergence and alignment


Review for NeurIPS paper: Directional convergence and alignment in deep learning

Neural Information Processing Systems, Feb-5-2025, 20:23:20 GMT

The reviewers agree that this paper is solving an important question with an interesting mathematical approach. The contributions, although technical, help to give rigorous justification to many works studying the behavior of neural networks or other models under gradient descent. Do not forget to update the paper as mentioned in the rebuttal.

deep learning, directional convergence and alignment, neurips paper


Directional convergence and alignment in deep learning

Neural Information Processing Systems, Oct-11-2024, 07:57:16 GMT


deep learning, directional convergence and alignment
